Bounded Model Checking of Multi-threaded Software using SMT solvers



Lucas Cordeiro Bernd Fischer 

University of Southampton University of Southampton

lcc08r@ecs.soton.ac.uk b.fischer@ecs.soton.ac.uk



ABSTRACT 

The transition from single-core to multi-core processors has made multi-threaded software an important subject in computer-aided verification. Here, we describe and evaluate an extension of the ESBMC model checker to support the verification of multi-threaded software with shared variables and locks using bounded model checking (BMC) based on Satisfiability Modulo Theories (SMT). We describe three approaches to model check multi-threaded software and our modelling of the synchronization primitives of the Pthread library. In the lazy approach, we generate all possible interleavings and call the BMC procedure on each of them individually, until we either find a bug or have systematically explored all interleavings. In the schedule recording approach, we encode all possible interleavings into one single formula and then exploit the high speed of the SMT solvers. In the underapproximation-widening approach, we reduce the state space by restricting the state variables and interleavings that are considered, and we widen this restriction using the proofs of unsatisfiability generated by the SMT solvers. In all three approaches, we use partial-order reduction (POR) techniques to reduce the number of interleavings explored. Experiments show that our approaches can analyze larger problems and substantially reduce the verification time compared to state-of-the-art techniques that combine classic POR methods with symbolic algorithms and others that implement the Counterexample-Guided Abstraction Refinement (CEGAR) technique.
1. INTRODUCTION

Embedded computer systems are used in a wide range of sophisticated applications, such as mobile phones or set-top boxes providing internet connectivity. The functionality demanded in such applications has increased significantly, and an increasing number of functions are implemented in software rather than hardware. As a result, multi-core processors with scalable shared memory have become popular in embedded systems. In turn, verifying the software design and the correctness of its multi-threaded implementation has become increasingly difficult.

Bounded model checking (BMC) has already been successfully applied to verify embedded software and discover subtle errors in real designs [6, …]. BMC generates verification conditions (VCs) that reflect the exact path in which a statement is executed, the context in which a given function is called, and the bit-accurate representation of the expressions. Proving the validity of these VCs remains a major performance bottleneck in verifying embedded software, despite attempts to cope with increasing system complexity by applying SMT (Satisfiability Modulo Theories) solvers [5, 19].

Recently, there have been attempts to extend BMC to the verification of multi-threaded software [14, 17, 18, 24]. The main challenge is the state space explosion problem, as the number of interleavings grows exponentially with the number of threads and program statements. However, two important observations help us: (i) SMT-based BMC finds counterexamples very quickly [9], and (ii) SMT solvers produce unsatisfiable cores that allow us to remove logic that is not relevant to a given property [20]. Grumberg et al. [16] realized that the unsatisfiable cores generated by the solvers can also be used to control the number of allowed interleavings of a given set of processes. They propose an algorithmic method based on Boolean Satisfiability (SAT) and BMC to model check a multi-process system using a series of under-approximated models. However, this method does not combine classic partial-order reduction (POR) methods with symbolic algorithms, which limits its usefulness for analyzing and verifying multi-threaded software. It has also not been applied in conjunction with SMT solvers.

In our prior work [5], we extended the encodings from previous SMT-based bounded model checkers [5, 13] to provide more accurate support for variables of finite bit width, bit-vector operations, arrays, structures, unions and pointers. Here, we develop three approaches to tackle complexity problems in model checking multi-threaded C software. In the lazy approach, we extend the BMC procedure for single-threaded software to multi-threaded software by wrapping it inside a straightforward generate-and-test loop, which generates all possible interleavings and calls the BMC procedure on each of them individually. We stop this loop when we either find a bug or have systematically explored all interleavings. In the schedule recording approach, we systematically explore the control-flow graph (CFG) of each thread and encode all possible execution paths into one single formula, which is then fed into the back-end SMT solver. In our third approach, we extend the under-approximation and widening (UW) algorithm proposed in [16] to address the verification of real-world C code using different background theories and SMT solvers.

We also implement partial-order reduction algorithms [3] in all three approaches and propose a comprehensive SMT-based BMC procedure to support the checking of multi-threaded programs that use the synchronization primitives of the POSIX Pthread library [21]. To our knowledge, this work marks the first application of the UW algorithm combined with POR techniques to model check non-trivial multi-threaded C software. Experiments with ESBMC show that our approaches can analyze larger problems and substantially reduce the verification time compared to state-of-the-art techniques that combine classic POR methods with symbolic algorithms and others that implement the Counterexample-Guided Abstraction Refinement (CEGAR) technique.



2. BOUNDED MODEL CHECKING OF MULTI-THREADED SOFTWARE

In BMC, the program to be analyzed is modelled as a state transition system, which is built by extracting its behaviour from the CFG. This graph is used as part of a translation process from program text to static single assignment (SSA) form. Each thread is modelled as a CFG where nodes represent program statements and edges represent transitions. A state transition system $M = (S, R, S_0)$ is an abstract machine that consists of a set of states $S$ (where $S_0 \subseteq S$ represents the set of initial states) and transitions $R$ between states, i.e., for each $\gamma \in R$, $\gamma \subseteq S \times S$. A state $s \in S$ consists of the value of the program counter $pc$ and the values of all program variables. An initial state $s_0$ assigns the initial program location of the CFG to $pc$. We identify each transition $\gamma = (s_i, s_{i+1})$ between two states $s_i$ and $s_{i+1}$ with a logical formula $\gamma(s_i, s_{i+1})$ that captures the constraints on the values of the program counter and the program variables.
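To make the notion of a transition formula concrete, here is a minimal sketch, under our own simplifying assumptions, that writes the transition for statement Y1 of the running example (Figure 1) as a C predicate over a pre-state and a post-state. The `state` type, its fields and the name `gamma_Y1` are hypothetical illustrations, not part of ESBMC; an actual encoding is a formula over SSA variables rather than executable code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state of the running example: the program
   counter of thread Ty plus the two shared variables it can touch. */
typedef struct {
  int pc_y;  /* 0 = before Y1, 1 = after Y1 (illustrative encoding) */
  int x;
  int j;
} state;

/* gamma_Y1(s, t) holds iff moving from pre-state s to post-state t is an
   execution of Y1 (x = 3), i.e. the else-branch of the test x > 3. */
static bool gamma_Y1(state s, state t) {
  return s.pc_y == 0 && !(s.x > 3)  /* Y1 is taken only if x > 3 is false */
      && t.pc_y == 1                /* the program counter advances */
      && t.x == 3                   /* effect of the assignment */
      && t.j == s.j;                /* frame condition: j is unchanged */
}
```

Evaluated on the initial value x = 2 and a successor with x = 3, the predicate holds; a BMC encoding asserts such constraints for every step of the unrolling instead of evaluating them on concrete states.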

As a running example, we consider the C program in Figure 1, which consists of two threads that are created using the Pthread library [21]. Note that our example contains a subtle bug (array lower bound) in line 9: because nondet_uint() is declared to return an int, it might non-deterministically return a negative integer, and as a result the assert macro (line 10) fails. Figure 2 shows the CFG representation of the two threads Tx and Ty. After creation, they are at the control points Tx0 and Ty0, respectively, and since x == 2 (see line 3 of Figure 1), both tests x > 2 and x > 3 are false. If we schedule Tx first, none of its transitions is enabled, and we can move to the next state only by switching to Ty and executing the single program statement Y1 (i.e., x = 3, see line 18) before terminating. If we continue exploring the remaining interleavings, we schedule Ty first, and the execution of Y1 makes the test x > 2 in line 7 true, thus enabling Tx to progress and transition through X0 and X1, i.e., we execute the program statements x = 3, a[i] = *((int*)arg), and assert(i >= 0 && i < N). Note that, as in [14], we do not model context switches inside the execution of individual statements, to avoid exploring additional interleavings. This approach is safe as long as statements only read or write a single global variable, but it is an under-approximation for programs that contain statements involving multiple global variables. However, with the publicly available benchmarks, we have not encountered any problems in practice.
 1  #include <pthread.h>
 2  #define N 10
 3  int a[N], i, j=1, x=2;
 4  int nondet_uint();
 5  void *Tx(void *arg)
 6  {
 7    if (x>2)
 8    {
 9      a[i] = *((int *)arg);   //X0
10      assert(i>=0 && i<N);    //X1
11    }
12  }
13  void *Ty(void *arg)
14  {
15    if (x>3)
16      a[j] = *((int *)arg);   //Y0
17    else
18      x = 3;                  //Y1
19  }
20  int main()
21  {
22    pthread_t id1, id2;
23    int arg1=10, arg2=20;
24    i=nondet_uint();
25    pthread_create(&id1, NULL, Tx, &arg1);
26    pthread_create(&id2, NULL, Ty, &arg2);
27  }







Figure 1: A multi-threaded C program with a violated property.



[Figure 2: CFGs of threads Tx and Ty, each running from START_THREAD to END_THREAD. Tx: Tx0: x>2; Tx1: a[i] = 10; Tx2: assert(i >= 0 && i < N). Ty: Ty0: x>3; Ty1: a[j] = 20; Ty2: x = 3.]

Figure 2: Control-flow graph of two threads.

Formally, given a transition system $M$, a property $\phi$, and a bound $k$, BMC unrolls the system $k$ times and translates it into a verification condition $\psi$ such that $\psi$ is satisfiable iff $\phi$ has a counterexample of depth less than or equal to $k$. The model checking problem associated with SMT-based BMC for checking linear-time temporal logic (LTL) properties is then formulated by constructing the logical formula:

$$\psi = \underbrace{I(s_0) \wedge \bigwedge_{j=1}^{n} \bigwedge_{i=0}^{k-1} \gamma_j(s_i, s_{i+1})}_{\text{constraints}} \wedge \underbrace{\bigwedge_{j=1}^{n} P_j(s_k)}_{\text{property}} \qquad (1)$$

where $P_j(s_k)$ represents an LTL property at step $k$ of thread $j$, $I$ characterizes the set of initial states of $M$, and $\gamma_j(s_i, s_{i+1})$ is the transition relation of thread $j$ between time steps $i$ and $i+1$. Hence, the formula $\bigwedge_{j=1}^{n} \bigwedge_{i=0}^{k-1} \gamma_j(s_i, s_{i+1})$ represents the set of all executions of the $n$ threads of length $k$ or less. $P_j(s_k)$ is derived from the property being checked and represents the condition that it is violated by a bounded execution of thread $j$ of length $k$ or less. Note that formula $\psi$ encodes all allowed interleavings of the given threads.

2.1 Lazy Approach 

Conceptually, the simplest way to extend a bounded model 
checker for single-threaded software to the multi-threaded 
case is to wrap it inside a straightforward generate-and-test 
loop: we just need to generate all possible interleavings and 
call the BMC procedure on each of them individually, until 
we either find an error, or have systematically explored all 
interleavings. 




Figure 3: (a) All possible thread interleavings in Figure 2; (b) the actual thread interleavings after using information from the front-end.

On the face of it, this seems naive: the number of interleavings can grow very quickly (see Figure 3(a) for all possible interleavings in the running example), and we need to invoke the model checker many times, which might slow down the verification process.

However, there are several observations that make this approach worthwhile. First, we can generate each interleaving, model check it, and stop the generation when we find the first error. In practice, if the program contains any errors, they will be exhibited in a substantial fraction of the interleavings, if not all (as reported in [23] for real applications), so that we only need to explore a small part of the search space. Second, we do not need to generate the source code of all possible interleavings. Instead, we keep in memory the nodes of all unexplored execution paths and expand them one path at a time. We then construct the VCs for the chosen execution path according to formula (1) and feed them into the SMT solver to check for satisfiability. Third, and most important, we can use information from the front-end to reduce both the number of interleavings to be explored and the size of the formulas sent to the SMT solver. In particular, during symbolic execution we exploit which transitions are enabled in a given state to drive the exploration of the interleavings.
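The generate-and-test loop can be illustrated with a small explicit-state sketch of the running example. It is a stand-in under stated assumptions, not ESBMC's implementation: instead of building VCs and calling an SMT solver for each path, it executes each interleaving concretely for one fixed value of i, and all names (`state`, `step`, `explore`, `lazy_check`) are hypothetical. It does show the two points argued above: interleavings are generated one path at a time by a depth-first search over unexplored choices, and generation stops at the first violating interleaving.

```c
#include <assert.h>
#include <stdbool.h>

#define N 10
#define DONE (-1)

/* Hypothetical explicit state of the running example (Figure 1). */
typedef struct {
  int x, i, j;
  int a[N];
  int pc[2];  /* pc[0]: thread Tx, pc[1]: thread Ty; DONE when finished */
} state;

/* Execute the next atomic step of thread t in place.
   Returns false if the step violates the assertion in line 10. */
static bool step(state *s, int t) {
  if (t == 0) {                                        /* thread Tx */
    switch (s->pc[0]) {
    case 0: s->pc[0] = (s->x > 2) ? 1 : DONE; break;   /* test x > 2 */
    case 1: if (s->i >= 0 && s->i < N)                 /* X0; the sketch skips */
              s->a[s->i] = 10;                         /* out-of-bounds writes */
            s->pc[0] = 2; break;
    case 2: s->pc[0] = DONE;                           /* X1: the assertion */
            return s->i >= 0 && s->i < N;
    }
  } else {                                             /* thread Ty */
    switch (s->pc[1]) {
    case 0: s->pc[1] = (s->x > 3) ? 1 : 2; break;      /* test x > 3 */
    case 1: s->a[s->j] = 20; s->pc[1] = DONE; break;   /* Y0 */
    case 2: s->x = 3; s->pc[1] = DONE; break;          /* Y1 */
    }
  }
  return true;
}

typedef struct { bool bug; int explored; } lazy_result;

/* Depth-first generate-and-test over all interleavings; stops at the first
   violation. r->explored counts completed bug-free interleavings. */
static bool explore(state s, lazy_result *r) {
  if (s.pc[0] == DONE && s.pc[1] == DONE) { r->explored++; return false; }
  for (int t = 0; t < 2; t++) {
    if (s.pc[t] == DONE) continue;
    state next = s;                    /* each branch works on its own copy */
    if (!step(&next, t)) return true;  /* violation: stop generating */
    if (explore(next, r)) return true;
  }
  return false;
}

/* Check the program for one concrete value of i, standing in for the
   non-deterministic value returned by nondet_uint(). */
static lazy_result lazy_check(int i_value) {
  state s = { .x = 2, .i = i_value, .j = 1 };  /* pc[0] = pc[1] = 0 */
  lazy_result r = { false, 0 };
  bool bug = explore(s, &r);
  r.bug = bug;
  return r;
}
```

For i = -1 the search completes two bug-free interleavings and then stops on the one that runs Y1 before X0 and X1; for i = 0 it explores every interleaving without finding a violation. Because the sketch treats the control-flow tests as separate steps, it enumerates three complete interleavings rather than the two discussed in the text.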

In our running example, the transitions from Tx0 to Tx1 and from Ty0 to Ty1 are disabled because we initially have x = 2. This rules out all interleavings that start with either X0 or Y0 and leaves only those that bypass Tx entirely or start with Y1. Assuming that we explore thread Tx first, in the first iteration we thus build the VCs only for the program statement Y1. We then pass formula (1) to the SMT solver and check its satisfiability. If it is satisfiable, we have found a property violation and can stop the process. Here, however, it is not satisfiable, and we continue to the next iteration by selecting an unexplored path. In the second iteration, we explore Ty first and select program statement Y1, and after that we explore Tx and select program statements X0 and X1. Again, we pass the corresponding version of formula (1) to the SMT solver. Since this is now satisfiable, we can stop the exploration of the execution paths.
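As an illustration, the satisfiable VC of the second iteration (path Y1; X0; X1) can be written in SSA form roughly as follows. This is a simplified sketch, not ESBMC's literal output: we assume the array-theory function $\mathit{store}$, write $i_0$ for the value returned by nondet_uint(), use the value 10 stored via arg1, and take the property part to be the negated assertion of line 10.

$$x_0 = 2 \wedge j_0 = 1 \wedge \underbrace{\neg(x_0 > 3) \wedge x_1 = 3}_{Y_1} \wedge \underbrace{(x_1 > 2) \wedge a_1 = \mathit{store}(a_0, i_0, 10)}_{X_0} \wedge \underbrace{\neg(i_0 \ge 0 \wedge i_0 < 10)}_{\neg X_1}$$

Any model with $i_0 < 0$ or $i_0 \ge 10$ satisfies this formula; for example, $i_0 = -1$ yields the array lower-bound violation discussed for Figure 1.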

In summary, we guide the symbolic execution between the threads and systematically explore all the possible execution paths in a lazy way. This approach can find bugs fast, but as the front-end might invoke the SMT solver repeatedly, once for each possible execution path, it can suffer performance degradation, in particular for correct programs, where we need to explore all possible interleavings. Each invocation is itself slow, and a formula must be passed from the front-end to the back-end every time. Moreover, execution paths that share the same program statements will be unnecessarily checked several times. However, as each formula corresponds to only one possible path, its size is relatively small compared to the schedule recording approach described in the next section, so it can be handled easily by the SMT solver without requiring too much memory.

2.2 Schedule Recording Approach 

State-of-the-art SMT solvers are built on top of SAT solvers and gain their performance by exploiting "conflict clauses" and non-chronological backtracking [25]. In the schedule recording approach, we leverage this and avoid invoking the SMT solver multiple times. We use the symbolic execution engine as before to systematically explore the interleavings, but now we add schedule guards to record in which order the scheduler has executed the program. We then encode all execution paths into one formula, which is finally fed into the SMT solver. However, the number of threads and context switches can grow very large quickly and easily "blow up" the solver. There is thus a clear trade-off between time and memory usage when model checking multi-threaded software.

Figure 4 illustrates our schedule recording encoding applied to the example in Figure 1. Since control-flow tests cannot influence the state (as the front-end hoists side-effects out of the tests), we only need to add guards to effective statements, i.e., assignments and assertions. Similarly, we only need to record effective context switches (ECS), i.e., context switches to an effective statement. These are shown as dashed arrows in Figure 4. Finally, we define an ECS block as a sequence of program statements that are executed with no intervening ECS, and give each block a number. Each effective program statement is then prefixed by a schedule guard $ts_i = j$, where $i$ is the ECS block number and $j$ is the thread identifier. Its intuitive interpretation is that the guarded statement can only be executed if thread $j$ is scheduled in the $i$-th ECS block. The value of $ts_i$ is set by the SMT solver and determines the order in which the program statements are executed. For example, the guard at Ty2 encodes that Y1 can only be executed if Ty runs in the first ECS block. Note that schedule guards are necessary but not sufficient conditions for the execution of a statement. For example, Ty1 has the same guard as Ty2, but Y0 cannot be executed under any viable schedule. The guards can also be combined conjunctively and disjunctively to encode more involved schedules. For example, the guard of both Tx1 and Tx2 corresponds to a schedule in which Ty ran before switching to Tx.



[Figure 4: the CFGs of Figure 2 annotated with schedule guards. Ty1 and Ty2 are guarded by ts1 == 2; Tx1 (a[i] = 10) and Tx2 (assert(i >= 0 && i < N)) are guarded by ts1 == 2 && ts2 == 1. Plain arrows denote control flow; dashed arrows denote effective context switches (ECS).]

Figure 4: Schedule encoding of the example in Figure 1.

The schedule guards are added by the front-end when program statements are executed symbolically and become part of the produced verification conditions. The thread selection variable $ts_i$ is a free variable that the SMT solver will try to instantiate with all possible concrete values. The thread number on the right-hand side of a guard is a constant that corresponds to the thread identifier. As an example, if the SMT solver chooses $ts_1 = 2$ and $ts_2 = 1$, then the program statements X0, X1, Y0, and Y1 are all enabled in principle, but which ones are executed depends on the values of the control-flow tests x > 2 and x > 3. Note that the ordering of statements within a thread is of course still ensured by the program order semantics, so that X1 will not be executed before X0. Consequently, all the combinations of the thread selection variables produce only two different interleavings: {Y1} and {Y1, X0, X1} (cf. Figure 3(b)).
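A minimal sketch of how the schedule guards decide which statements run, assuming thread identifiers Tx = 1 and Ty = 2 and modelling only the guards of Figure 4 (Ty's statements guarded by $ts_1 = 2$, Tx's by $ts_1 = 2 \wedge ts_2 = 1$). Concrete execution replaces the symbolic encoding, and the SMT solver's choice of $ts_1, ts_2$ is replaced by a brute-force loop over both values; the names `violated` and `find_schedule` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define N 10
enum { TX = 1, TY = 2 };  /* thread identifiers as used in the guards */

typedef struct { bool found; int ts1, ts2; } schedule;

/* Evaluate the guarded program of the running example for one choice of the
   thread selection variables ts1, ts2 and one value of i. Returns true if
   the guarded assertion X1 is violated under this schedule. */
static bool violated(int ts1, int ts2, int i) {
  int x = 2, j = 1, a[N] = {0};
  if (ts1 == TY) {                        /* guard of Y0 and Y1: ts1 == 2 */
    if (x > 3) a[j] = 20;                 /* Y0 */
    else       x = 3;                     /* Y1 */
  }
  if (ts1 == TY && ts2 == TX && x > 2) {  /* guard of X0, X1 plus the test */
    if (i >= 0 && i < N) a[i] = 10;       /* X0 (out-of-bounds write skipped) */
    return !(i >= 0 && i < N);            /* X1 fails iff i is out of bounds */
  }
  return false;
}

/* Brute-force stand-in for the SMT solver: try every value of the free
   variables ts1, ts2 and return the first schedule that violates X1. */
static schedule find_schedule(int i) {
  for (int ts1 = TX; ts1 <= TY; ts1++)
    for (int ts2 = TX; ts2 <= TY; ts2++)
      if (violated(ts1, ts2, i)) return (schedule){ true, ts1, ts2 };
  return (schedule){ false, 0, 0 };
}
```

For i = -1 the only violating choice is ts1 = 2 and ts2 = 1, i.e., Ty in the first ECS block and Tx in the second, which is exactly the guard of Tx1 and Tx2; for an in-bounds value such as i = 3 no schedule violates the assertion.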

Given this, we can define a schedule $SCH$ that determines which interleavings will be considered, and encode the guards in formula (1) as:

$$\psi = \underbrace{I(s_0) \wedge \bigwedge_{j=1}^{n} \bigwedge_{i=0}^{k-1} \gamma'_j(s_i, s_{i+1})}_{\text{constraints}} \wedge \underbrace{\bigwedge_{j=1}^{n} P_j(s_k)}_{\text{property}} \wedge \bigwedge_{i=0}^{k} SCH(s_i) \qquad (2)$$

where $\gamma'_j$ represents the modified transition relation incorporating the schedule guards added by the front-end, and $SCH(s_i)$ represents a constraint on the schedule. If we do not add any constraints, then $\bigwedge_{i=0}^{k} SCH(s_i) = \mathit{true}$ and all possible interleavings are considered. However, if we want to apply more aggressive POR techniques, we can add constraints to $SCH$ in order to remove interleavings that do not contribute to checking a given property. In our running example, we can add the constraint $ts_1 = 2 \wedge ts_2 = 1$ to remove the interleaving {Y1} (see Figure 3(b)), which does not contribute to checking the assertion in line 10 of Figure 1.

2.3 UW Approach 

The core idea of the under-approximation and widening (UW) approach is to consider a series of under-approximations of a given model by encoding additional literals into the verification condition $\psi$ and by extracting the proof objects generated by an SMT solver [11]. We define $\psi'$ as an underapproximated model of $\psi$, i.e., $\psi' = \psi \wedge \bigwedge_{i=1}^{n} l_i$, where $l_1, l_2, \ldots, l_n$ are additional literals that guard the program statements of each thread. Similar to the schedule guards described in the previous section, these literals also control the symbolic execution: a program statement is executed only if both the literal and its corresponding guard are enabled. Consequently, if $\psi$ is unsatisfiable, then $\psi'$ is also unsatisfiable, i.e., there is no assignment to the literals $l_1, l_2, \ldots, l_n$ that makes $\psi'$ satisfiable. However, it is possible that $\psi$ is satisfiable while $\psi'$ is not, due to the additional literals. Thus, $\psi'$ can be thought of as an underapproximation of $\psi$, and each satisfying assignment of $\psi'$ is also a satisfying assignment of $\psi$. These additional literals then allow us to guide the widening process according to the variables that participate in the proof of unsatisfiability produced by the SMT solver. In the formal description, we rewrite formula (2) as

$$\psi' = \underbrace{I(s_0) \wedge \bigwedge_{j=1}^{n} \bigwedge_{i=0}^{k-1} \gamma'_j(s_i, s_{i+1})}_{\text{constraints}} \wedge \underbrace{\bigwedge_{j=1}^{n} P_j(s_k)}_{\text{property}} \wedge \bigwedge_{i \in T} \bigwedge_{j \in S} l_{ij} \qquad (3)$$

where $l_{ij} \in L$ are literals that encode the program statements of each thread. We denote the set of threads by $T$, the set of program statements by $S$, and the set of control literals by $L$. In the example of Figure 3, $T = \{T_x, T_y\}$, $S = \{X_0, X_1, Y_0, Y_1\}$, and $L = \{l_{x_0}, l_{y_0}\}$. Note that the way we encode the underapproximation differs from [16]. The authors of [16] encode an underapproximation using $m \times n$ control literals, where $m$ is the number of control points that guard each program statement and $n$ is the number of processes. In our encoding, we use the same guards as in the schedule recording approach as control literals, i.e., $l_{x_0} = (ts_1 = 2) \wedge (ts_2 = 1)$ and $l_{y_0} = (ts_1 = 2)$. We also use information from the front-end (as described in Section 2.1) to substantially reduce the number of control literals required. If we were to include a control literal for each statement as in [16], then our solution might not scale in practice to large software systems.

The main difference between the schedule recording and the UW approaches is that the schedule remains fixed and is by default set to true, while the UW model is updated based on the information extracted from the proof and is initially set to false. The widening process then works as follows. Initially, each literal in $L$ is set to false, because we aim to minimize the number of interleavings. At every state, we only consider the thread with the smallest index that has enabled transitions, and only expand those. In our running example, we first execute program statement Y1, because the tests x > 2 and x > 3 are false and consequently no other thread has enabled transitions. As the global variable x is initially set to 2 (line 3), at the first step we consider that only program statement Y1 is expanded from the initial state, and we build formula (3) by guarding statement Y1 (x = 3) with $l_{y_0}$. After that, we invoke the SMT solver to extract the unsatisfiable core and check that $l_{y_0}$ participated in the proof of unsatisfiability. In the second iteration of our algorithm, we remove $l_{y_0}$ from $L$ so that $l_{y_0}$ can now become either true or false (while the others must remain false). Afterwards, we symbolically execute program statements X0; X1; Y0 and build formula (3). We check that $l_{x_0}$ participated in the proof of unsatisfiability. At the next iteration, we remove $l_{x_0}$ from $L$, symbolically execute program statements Y0; X0; X1, and build formula (3). At this iteration, we have found a violation of the property and the UW procedure terminates; otherwise, the procedure would continue until none of the additional literals in $L$ participates in the proof of unsatisfiability. This means that the proof does not rely on the underapproximation itself, and we conclude that the property holds.
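The widening loop can be sketched as follows. This is a toy under stated assumptions: the control literals are bits of an unsigned mask, and the SMT solver call that builds formula (3) and returns either a model or an unsatisfiable core is replaced by the hypothetical function `toy_check`, whose answers simply mirror the three iterations described above. Only `uw_loop` (also a hypothetical name) reflects the widening logic: literals still forced to false that appear in the core are released, and the procedure stops either on a satisfying assignment or when the core no longer mentions any forced literal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control literals of the running example, as bit positions. */
enum { L_Y0 = 1 << 0, L_X0 = 1 << 1 };

typedef struct {
  bool sat;       /* satisfiable: a counterexample exists */
  unsigned core;  /* control literals that occur in the unsatisfiable core */
} check_result;

typedef enum { UW_BUG, UW_SAFE } uw_verdict;
typedef struct { uw_verdict verdict; int iterations; } uw_result;

/* Toy stand-in for "build psi' and call the SMT solver". `enabled` holds the
   literals no longer forced to false. Here the violation needs both Ty's and
   Tx's statements, and the core blames the first literal still forced false,
   mirroring the order of the example in the text. */
static check_result toy_check(unsigned enabled) {
  check_result r = { false, 0u };
  if (!(enabled & L_Y0))      r.core = L_Y0;
  else if (!(enabled & L_X0)) r.core = L_X0;
  else                        r.sat = true;
  return r;
}

/* Under-approximation widening: start with every control literal forced to
   false, and widen by the forced literals that appear in the unsat core. */
static uw_result uw_loop(check_result (*check)(unsigned)) {
  unsigned enabled = 0u;
  uw_result res = { UW_SAFE, 0 };
  for (;;) {
    res.iterations++;
    check_result r = check(enabled);
    if (r.sat) {                 /* a model of psi' is also a model of psi */
      res.verdict = UW_BUG;
      return res;
    }
    unsigned widen = r.core & ~enabled;
    if (widen == 0u) {           /* proof does not depend on the restriction */
      res.verdict = UW_SAFE;
      return res;
    }
    enabled |= widen;
  }
}
```

With `toy_check`, the loop releases $l_{y_0}$, then $l_{x_0}$, and reports the violation in the third iteration, as in the walkthrough above.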



2.4 Partial Order Reduction

In the modelling of multi-threaded software, we consider that any of the threads $j \in T$ is able to make a transition, and we then have to compute all states for which such a thread $j$ exists (i.e., $\bigwedge_{j=1}^{n} \gamma_j(s_i, s_{i+1})$). The problem is that the number of states to be explored can grow dramatically with the number of program statements and threads. The purpose of partial-order reduction (POR) techniques [1, 8, 15, 22] is to reduce the number of states that have to be explored. This is done in such a way that if the property holds on the reduced model, it also holds on the original model.

In our SMT-based BMC framework, as threads communicate only through global variables, we apply POR techniques at two levels in our algorithm. At the first level, we apply the visible instruction analysis POR (VI-POR) [22], which removes the interleavings of instructions that do not affect the global variables, i.e., we remove transitions which are independent from transitions made by any other thread. An instruction is visible if it accesses a global variable, and invisible otherwise. At the second level, we apply the read-write analysis POR (RW-POR) [8], in which two (or more) independent interleavings can be safely merged into one. In order to implement RW-POR, we compute the sets of variables written ($WR_j$) and read ($RD_j$) by each thread $j$. If $WR_j \cap \left(\bigcup_{k \neq j} RD_k \cup WR_k\right) = \emptyset$ and $RD_j \cap \bigcup_{k \neq j} WR_k = \emptyset$, i.e., if thread $j$ writes no visible variable that another thread reads or writes, and reads no visible variable that another thread writes, then we only explore the successors generated by executing $j$, and all other transitions can be safely ignored.

There are six possible combinations of visible instructions of different threads, as shown in Table 1. There are three particular situations to consider when we generate the interleavings: (i) two read operations do not modify the state, so they always generate equivalent interleavings; (ii) two program statements accessing different variables are independent with respect to their execution states, so they always generate equivalent interleavings in both execution orders; (iii) two instructions accessing the same variable with a read-write or write-write relation generate non-equivalent interleavings. In these cases, the read-write relation causes read-write races and the write-write relation causes write-write races. In summary, only two types of relations generate non-equivalent interleavings, while the other four types generate equivalent interleavings. Those redundant interleavings are simply removed in our approach.

Access relation       Read-read    Read-write       Write-write
Same variable         Equivalent   Non-equivalent   Non-equivalent
Different variables   Equivalent   Equivalent       Equivalent

Table 1: Read-write analysis of interleaving equivalence between visible instructions.

Both PORs described above work best in conjunction with an alias analysis. However, at this point in our work, we have not implemented one. We thus assume that the actual thread parameters are not aliased to global variables or to each other. In addition, we do not remove redundant interleavings originating from pointer aliasing.
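The RW-POR condition of Section 2.4 reduces to simple set operations. In this sketch the visible variables are numbered and the sets $RD_j$ and $WR_j$ are bitmasks; the variable numbering, the type `rw_sets` and the function `rw_independent` are our own illustrative assumptions, not ESBMC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering of the visible (global) variables of Figure 1. */
enum { V_X = 1 << 0, V_A = 1 << 1, V_I = 1 << 2, V_J = 1 << 3 };

typedef struct {
  unsigned rd;  /* RD_j: visible variables read by thread j */
  unsigned wr;  /* WR_j: visible variables written by thread j */
} rw_sets;

/* True if thread j neither writes a variable that another thread reads or
   writes, nor reads a variable that another thread writes. In that case
   only the successors produced by executing j need to be explored. */
static bool rw_independent(const rw_sets *threads, int n, int j) {
  unsigned others_rd = 0u, others_wr = 0u;
  for (int k = 0; k < n; k++) {
    if (k == j) continue;
    others_rd |= threads[k].rd;
    others_wr |= threads[k].wr;
  }
  return (threads[j].wr & (others_rd | others_wr)) == 0u
      && (threads[j].rd & others_wr) == 0u;
}
```

For Figure 1, Tx reads x and i and writes a, while Ty reads x and j and writes x and a. Tx's write of a conflicts with Ty's, so the check fails and both orders must be kept; with disjoint variables the check succeeds.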



3. MODELLING SYNCHRONIZATION PRIMITIVES IN PTHREAD

This section presents our modelling of the synchronization primitives of the Pthread library [21]. We assume that the library function implementations are correct and focus our effort only on verifying client programs that use them. We thus provide an instrumented model of the Pthread functions and use it to model check the client code. Our experiments show that this modelling is able to detect incorrect use of the functions as well as blocking operations that can lead to global deadlocks.

3.1 Modelling Mutex Locking Operations 

The Pthread library supports two functions to implement mutual exclusion between threads: pthread_mutex_lock and pthread_mutex_unlock. The argument to these functions is a C data structure called mutex that, in our modelling, has two states, "locked" and "unlocked". pthread_mutex_lock locks the mutex if it is unlocked; otherwise, it blocks the current thread until the mutex is released and can then be locked successfully. pthread_mutex_unlock unlocks a mutex that was previously locked by the same thread.

Execution paths are considered to be blocked on a mutex when the thread tries to lock a mutex that has already been locked by another thread. Such blocking paths are also called non-wait-free paths. In order to model mutex operations, we apply the notion of wait-free paths as initially proposed in [24]. However, in contrast to [24], our approach is able to model check multi-threaded programs that use mutexes, can handle more than two threads, can detect deadlocks, and does not require the user to run the model checker twice in order to detect different types of bugs ("regular" and concurrency bugs).

To explain how mutexes are encoded in our SMT-based BMC framework, we consider the example in Figure 5. In this example, both threads Ta and Tb lock and unlock the same mutex m. The execution paths A0; A1; B0; B1 and B0; B1; A0; A1 are unblocked, while the others are blocked paths. However, instead of blocking the execution paths starting with A0; B0 and B0; A0, we simply ignore the state of the mutex, so that we do not block the remaining instructions, and just lock it (again). In pthread_mutex_unlock, we simply check whether the mutex is locked and, if so, release the lock; otherwise, we have detected an error.
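The race-finding mutex model just described fits in a few lines. The sketch below uses a hypothetical `model_mutex` type and hypothetical function names; it leaves out atomicity and the model checker's assertion machinery, and reports the unlock error through a return code instead of a failed assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state mutex: "locked" or "unlocked". */
typedef struct { bool locked; } model_mutex;

/* Lock: rather than blocking when the mutex is already held, the model
   ignores its current state and simply (re)locks it, so no path gets stuck. */
static int model_mutex_lock(model_mutex *m) {
  m->locked = true;
  return 0;
}

/* Unlock: release the mutex if it is locked; unlocking a mutex that is not
   locked is an error (a failed assertion in the model checker, -1 here). */
static int model_mutex_unlock(model_mutex *m) {
  if (!m->locked) return -1;
  m->locked = false;
  return 0;
}
```

Because the lock operation never blocks, every interleaving of Figure 5 runs to completion, which is what allows data races to be found; the price is that deadlocks are no longer visible.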



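[Figure 5: threads Ta (statements A0, A1) and Tb (statements B0, B1), each running from START_THREAD to END_THREAD and locking and unlocking the same mutex m.]

Figure 5: Execution paths blocking on a mutex.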

This modelling is sufficient to find bugs related to data races. However, it is not able to detect deadlocks. In order to detect global deadlock situations caused by the wrong use of mutexes, we need to look in more detail at the possible states that a thread can be in with our modelling: (i) Join state: the thread is waiting for thread termination; (ii) Lock state: the thread is waiting for a mutex to be unlocked; (iii) Wait state: the thread is waiting for a signal or broadcast to wake up; (iv) Exit state: the thread has already exited; (v) Free state: the thread is not in any of the above four states and is free to execute its instructions. A thread is blocked if it is in the join, lock, or wait state, and is considered running if it is not in the exit state. A global deadlock occurs if no running thread is in the free state, i.e., the number of blocked threads is equal to the number of running threads. In order to model deadlock, counts of both blocked threads and running threads are maintained in global variables. Figure 6 presents our modelling of pthread_mutex_lock to detect global deadlock with mutexes. We define mutex_lock_field and mutex_count_field as C macros in lines 1 and 3, respectively.
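The deadlock criterion can be written as a check over the five thread states. This sketch recomputes the two counts from an array of states, whereas the modelling described here maintains them incrementally in global variables; the enum and function names are hypothetical, and the `running > 0` guard, which keeps a program whose threads have all exited from being reported as deadlocked, is our own assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* The five thread states of the modelling (names are illustrative). */
typedef enum { T_JOIN, T_LOCK, T_WAIT, T_EXIT, T_FREE } thread_state;

/* Global deadlock: every running (non-exited) thread is blocked in the join,
   lock or wait state, i.e. blocked == running. The running > 0 guard is an
   added assumption so that "all threads exited" is not reported. */
static bool global_deadlock(const thread_state *ts, int n) {
  int blocked = 0, running = 0;
  for (int k = 0; k < n; k++) {
    if (ts[k] != T_EXIT) running++;
    if (ts[k] == T_JOIN || ts[k] == T_LOCK || ts[k] == T_WAIT) blocked++;
  }
  return running > 0 && blocked == running;
}
```

Two threads each waiting for a mutex held by the other are both in the lock state, so blocked = running = 2 and a deadlock is reported; as soon as one running thread is free, the check fails.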

We use the count field of the pthread_mutex_t data structure to count the number of threads that are in the lock state due to this mutex, and trds_in_run to track the global number of threads that are currently running. Initially, the mutex is unlocked; it is locked by the first call to pthread_mutex_lock. In subsequent calls, we increase the count field, allow context switches, check whether the mutex was unlocked, and then assert count < trds_in_run. If the assertion fails, a global deadlock has been detected (i.e., a thread is blocked by a lock operation on a mutex and the required mutex never gets unlocked by the thread that owns it, either because the locking thread has exited or because it has been blocked by another operation). If the assertion holds, we then eliminate this execution as described above. The modelling of pthread_mutex_unlock, which is similar to [24], is shown at the bottom of Figure 6.

      ...
      mutex_count_field(*mutex)++;
      ...
      deadlock = (mutex_count_field(*mutex) < trds_in_run);
      assert(deadlock);
      atomic_end();
      return 0;
    }

    int pthread_mutex_unlock(pthread_mutex_t *mutex)
    {
      atomic_begin();
      assert(mutex_lock_field(*mutex));
      mutex_lock_field(*mutex) = 0;
      atomic_end();
      return 0;
    }

Figure 6: Modelling mutex lock and unlock operations to detect global deadlock.
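The counting argument behind the lock model can be replayed sequentially. The C sketch below illustrates it under stated assumptions and is not the code of Figure 6: struct model_mutex, enum lock_result, model_lock, and model_unlock are hypothetical names, the two struct fields stand in for what mutex_lock_field and mutex_count_field access, and the context switches of the real model are collapsed into an immediate check.

```c
#include <assert.h>

/* Hypothetical stand-ins for the fields accessed through the
   mutex_lock_field and mutex_count_field macros of Figure 6. */
struct model_mutex {
  int lock;  /* 1 while some thread holds the mutex */
  int count; /* threads in the lock state because of this mutex */
};

enum lock_result { LOCK_ACQUIRED, LOCK_BLOCKED, LOCK_DEADLOCK };

/* One call of the modelled pthread_mutex_lock. Where ESBMC allows
   context switches, this sketch checks immediately, i.e. it assumes
   that no other thread unlocks the mutex in the meantime. */
enum lock_result model_lock(struct model_mutex *m, int trds_in_run)
{
  if (m->lock == 0) {
    m->lock = 1;          /* first caller takes the mutex */
    return LOCK_ACQUIRED;
  }
  m->count++;             /* caller enters the lock state */
  if (m->count < trds_in_run)
    return LOCK_BLOCKED;  /* assertion holds: a running thread may still unlock */
  return LOCK_DEADLOCK;   /* assertion fails: all running threads wait on m */
}

/* Modelled pthread_mutex_unlock: the mutex must currently be locked. */
void model_unlock(struct model_mutex *m)
{
  assert(m->lock);
  m->lock = 0;
}
```

With trds_in_run = 2, the sequence "thread A locks, thread B locks, thread A locks again" yields LOCK_ACQUIRED, LOCK_BLOCKED, and then LOCK_DEADLOCK: after the third call both running threads are in the lock state on the same mutex, so count < trds_in_run no longer holds.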

3.2 Modelling Conditional Waiting 

From the Pthread library, we consider the conditional-waiting functions pthread_cond_wait, pthread_cond_signal, and pthread_cond_broadcast. The function pthread_cond_wait takes two data structures as arguments, cond and mutex, where, in our modelling, cond also has two states, "locked" and "unlocked". The other two functions take only the argument cond. Our modelling of the conditional waiting operation also employs the notion of wait-free execution paths. The function pthread_cond_wait blocks the calling thread on a condition variable, and the blocked thread is awakened only if another thread calls signal or broadcast. If several threads are blocked on a condition variable, a pthread_cond_signal call unblocks at least one of them (with no guarantee of which one is woken up, since this depends on the scheduling policy), whereas a pthread_cond_broadcast call unblocks all threads currently blocked on the specified condition variable.
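The difference between the two wake-up primitives can be sketched with a set of blocked threads. The following C fragment is a hypothetical illustration, not part of the Pthread API or of ESBMC: waitset, model_signal, and model_broadcast are invented names, threads are identified by bit positions below 32, and the scheduler's nondeterministic choice is made explicit through the pick parameter. The sketch wakes exactly one thread on a signal, which is one permitted behaviour of "at least one".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i is set while thread i (i < 32) is blocked
   on the condition variable. */
typedef uint32_t waitset;

/* Signal: wakes one waiter. POSIX only guarantees "at least one", and the
   choice depends on the scheduler, so the woken thread is passed in as
   `pick`. Returns the set of woken threads. */
waitset model_signal(waitset *waiters, unsigned pick)
{
  if (*waiters == 0)
    return 0;                 /* no waiter: the signal has no effect */
  assert(pick < 32 && (*waiters & ((waitset)1 << pick)));
  *waiters &= ~((waitset)1 << pick);
  return (waitset)1 << pick;
}

/* Broadcast: wakes every thread currently blocked on the condition. */
waitset model_broadcast(waitset *waiters)
{
  waitset woken = *waiters;
  *waiters = 0;
  return woken;
}
```

Starting with threads 0, 1, and 2 blocked, a signal with pick = 1 wakes thread 1 only, and a subsequent broadcast wakes threads 0 and 2.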

Figure 7 shows our modelling of the wait operation primitive. We assume that initially there is no deadlock (see line 4), and whenever a thread calls pthread_cond_wait, we atomically lock the condition variable cond, assert that the mutex is currently locked, and then release the mutex so that other threads that access it can make progress (i.e., wait-free execution). Afterwards, we allow context switches and then check whether the number of threads in the wait state (i.e., threads that are waiting for a signal or broadcast to wake up) is less than the total number of threads that are currently running.
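The deadlock check in the wait model compares two counters. A minimal C sketch, assuming hypothetical names (struct model_cond, model_cond_wait, and the parameter mutex_lock; trds_in_run is borrowed from the mutex model) and collapsing the context switches into an immediate check, might look as follows. It is not the code of Figure 7.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state for one condition variable; nwaiters plays
   the role of the count that Figure 7 accesses via cond_nwaiters_field. */
struct model_cond {
  int lock;     /* cond "locked" (1) or "unlocked" (0) */
  int nwaiters; /* threads in the wait state on this condition */
};

/* Modelled pthread_cond_wait up to the deadlock check. Returns true if
   fewer threads wait than are running, i.e. some running thread could
   still signal or broadcast; false indicates a global deadlock. */
bool model_cond_wait(struct model_cond *c, int *mutex_lock, int trds_in_run)
{
  c->lock = 1;          /* atomically lock the condition variable */
  assert(*mutex_lock);  /* the caller must hold the mutex */
  *mutex_lock = 0;      /* release it so other threads can make progress */
  c->nwaiters++;        /* the caller enters the wait state */
  /* ESBMC allows context switches here; the sketch checks at once. */
  return c->nwaiters < trds_in_run;
}
```

If two threads are running and both end up waiting on the same condition, the second call returns false, since no running thread is left to send a signal or broadcast.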

    ...
    int pthread_cond_signal(pthread_cond_t *cond)
    {
      atomic_begin();
      cond_lock_field(*cond) = 0;
      cond_nwaiters_field(*cond)--;
      atomic_end();
      return 0;
    }

Figure 7: Modelling conditional waiting and signal operations to detect global deadlock.

To model signal operations, we simply release the condition variable and decrement the number of threads that are blocked on the specified condition variable. Figure 7 also shows the modelling of the conditional signal operation.

To model broadcast operations, we create an additional global variable, broadcast_id, which records the number of broadcast operations that have executed and is incremented inside the function pthread_cond_broadcast. In the wait operation, the thread first records the current value of broadcast_id and is then forced to switch context to other threads. When the context is switched back to the current thread, an assertion checks whether a broadcast operation has occurred, i.e., whether the current value of broadcast_id is greater than the recorded one. A deadlock is detected if no execution path contains a broadcast operation.
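The broadcast_id mechanism amounts to comparing a global counter with a snapshot taken before the thread yields. In the C sketch below, only broadcast_id comes from the text; model_cond_broadcast, model_wait_record, and model_wait_saw_broadcast are hypothetical helpers, the context switches in between are left to the caller, and the sketch assumes fewer than UINT_MAX broadcasts so that the counter does not wrap around.

```c
#include <assert.h>
#include <stdbool.h>

/* Global counter from the text: number of broadcasts executed so far. */
static unsigned broadcast_id = 0;

/* Modelled pthread_cond_broadcast: record one more broadcast. */
void model_cond_broadcast(void)
{
  broadcast_id++;
}

/* Wait, first half: remember broadcast_id before switching context. */
unsigned model_wait_record(void)
{
  return broadcast_id;
}

/* Wait, second half: after being scheduled again, check whether a
   broadcast occurred in the meantime. */
bool model_wait_saw_broadcast(unsigned recorded)
{
  return broadcast_id > recorded;
}
```

A waiting thread that records the counter, is descheduled while another thread broadcasts, and then resumes finds broadcast_id greater than its snapshot; on paths where no broadcast executes, the check fails.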

4. EXPERIMENTAL EVALUATION 

We have implemented the lazy, schedule recording, and UW approaches described in Section 5 in our ESBMC¹ (Efficient SMT-Based Bounded Model Checker) tool, which supports the SMT logics QF_AUFBV and QF_AUFLIRA from SMT-LIB [26]. In our experiments, we used ESBMC v1.3 together with the SMT solver Z3 [11].

The experimental evaluation of our work consists of two parts. In Section 4.1, we compare our approaches against Monotonic Partial Order Reduction (MPOR) [18] and Peephole Partial Order Reduction (PPOR) [23], which are implemented in an SMT-based bounded model checker using the Yices SMT solver [12]. In Section 4.2, we compare our approaches against SATABS version 2.4 [7] connected to Cadence SMV [15]; SATABS is a state-of-the-art C model checker that supports the verification of multi-threaded software with shared variables using the CEGAR technique. All experiments were conducted on an otherwise idle 3 GHz Intel Xeon 5160 server with 4 GB of RAM running Linux. For all benchmarks, the time limit was set to 3600 seconds for each individual property. All times are wall-clock times in seconds, as measured by the Unix time command over a single execution.

4.1 Comparison to MPOR and PPOR

We use the dining philosophers model to evaluate our approaches against MPOR and PPOR. Since the benchmarks used in [18] are not available, we re-implemented them as described there; our implementation is available at users.ecs.soton.ac.uk/lcc08r/esbmc. Each philosopher has its own local variables, and the philosophers communicate only through a global shared array of forks. This version guarantees the absence of deadlocks. As in [18], we check two properties: (i) whether all philosophers can eat simultaneously (this property does not hold, i.e., the verification condition is unsatisfiable) and (ii) whether all philosophers have eaten at least once (this property holds, i.e., the verification condition is satisfiable). The authors of [18] ran their experiments on a workstation with a 2.8 GHz Xeon processor and 4 GB of RAM running Linux. To make the results comparable, we scale their times in Table 2, giving both the original (in brackets) and the scaled timings.

Table 2 shows the detailed results of the comparison between MPOR, PPOR, and the three ESBMC approaches. The first column, #L, gives the number of lines of code, and the second column, #T, the total number of threads. The Time column gives the time in seconds, column #I the total number of generated interleavings, and column #IF the total number of failed interleavings. The column Iter gives the number of iterations needed to prove or disprove the property in the UW approach.

As Table 2 shows, our approaches perform on par with MPOR on the first property of the model until the number of philosophers reaches 5. As the number of philosophers increases further, MPOR performs better than our approaches. However, all three of our approaches perform better than PPOR on the first property. In addition, lazy ESBMC scales significantly better than the other approaches on the second property of the dining philosophers model, i.e., whether all philosophers have eaten at least once. Column #I/#IF also shows that all interleavings generated by lazy ESBMC are satisfiable. UW and schedule ESBMC also perform better than MPOR and PPOR until the number of philosophers reaches 6. In summary, our lazy approach outperforms both MPOR and PPOR on benchmarks that generate satisfiable formulae and remains comparable to MPOR and PPOR when the generated formulae are unsatisfiable.

Table 2: Results of the comparison between MPOR, PPOR, lazy, schedule, and UW ESBMC.

¹Available at http://users.ecs.soton.ac.uk/lcc08r/esbmc/

4.2 Comparison to SATABS 

To evaluate our approaches against SATABS, we used a number of multi-threaded programs taken from standard benchmark suites (see Table 3). Programs 1-12 implement the dining philosophers problem as described in Section 4.1; in this implementation, we set the number of philosophers (threads) to 2, 3, ..., 7 and compare the runtime performance of the three approaches against SATABS. Programs 13-22 are taken from the benchmark suite of the INSPECT tool gS]. This suite contains programs with two or more threads as well as the mutex and condition synchronization primitives from the Pthread library. Programs 23 and 24 are taken from the Helgrind benchmark suite [2] and contain concurrency bugs related to lock and unlock operations. Note that most of these benchmarks contain data dependencies among the threads (i.e., the threads access the global variables).

Table 3 shows the detailed results of the comparison between UW, lazy, and schedule ESBMC as well as SATABS. We did not run programs 19-22 with SATABS because it does not support the condition synchronization primitive. Note also that the verification times of programs 1-12 in Table 3 differ from those in Table 2 because, instead of checking a single property, here we check properties related to mutex operations and array bounds, which both SATABS and ESBMC can generate automatically. Hence, the column #P gives the number of properties to be verified for each multi-threaded C program. The Time column gives the time in seconds to check all properties of a given program, and Failed indicates how many properties failed during the verification process. Here, properties can fail for two reasons: either due to a time-out (TO) or due to a memory-out (MO).

As Table 3 shows, our lazy ESBMC approach performs significantly better than the other approaches on benchmarks that contain bugs (i.e., where the formula sent to the SMT solver is satisfiable). However, if the benchmark contains no bug, then schedule ESBMC performs better than UW and lazy ESBMC, but not as well as SATABS on the dining philosophers benchmark. This indicates that our SMT-based BMC procedures do not scale well for problems of increasing complexity, i.e., for a large number of threads and data dependencies among the threads. However, SATABS times out for programs 17 and 18, and provides false results for programs 7-12, 15, 23, and 24, of which the last two contain deadlocks due to the incorrect use of lock and unlock operations. This suggests that SATABS does not explore all interleavings and does not add checks for detecting deadlocks, which would explain its better scaling on the dining philosophers benchmark.

Our UW ESBMC algorithm outperforms SATABS on most of the multi-threaded programs in Table 3, except for programs 5, 6, 10, 11, and 12. However, in these programs SATABS provides false results, as discussed above. In any case, it is important to note that whenever we enabled the proof generation feature of the SMT solver to extract unsatisfiable cores, we observed memory overhead and corresponding slowdowns, as also reported previously in [TD]. Additionally, we observed that the performance of the UW ESBMC procedure can be significantly improved by using heuristics to update the set of additional literals in L to be used in the next iteration of the algorithm. However, we have not yet investigated alternative ways of updating the set L. We limited the unsatisfiable core to at most 500 control literals, since the SMT solver Z3 fails with a segmentation fault when there are thousands of literals. This situation occurs only with the dining philosophers model when the number of philosophers is set to 6 or more. We reported this bug to the Z3 developers, who were already aware of the problem.

5. RELATED WORK

SMT-based BMC is gaining popularity in the formal verification community due to the advent of sophisticated SMT solvers built over efficient SAT solvers [11]. Ganai and Gupta describe a BMC verification framework that extracts high-level design information from an extended finite state machine (EFSM) and applies several techniques to simplify the BMC problem [13]. However, the authors use only the theory of integer and real arithmetic, which does not precisely reflect the ANSI-C semantics. Armando et al. also propose a BMC approach using SMT solvers for ANSI-C programs [5], but they only use linear arithmetic, arrays, records, and restricted bit-vector arithmetic; as a consequence, their SMT-CBMC prototype does not address important constructs of the ANSI-C language.

Qadeer and Rehof present a pragmatic method to discover bugs in concurrent software in which the program analysis is restricted to executions with a bounded number of context switches. However, this method is incomplete, since it only verifies the program up to a given fixed context bound. In addition, the authors do not apply it to realistic and large concurrent software benchmarks, and the integration of this context-bounded model checking algorithm into the explicit-state model checker ZING 0] is left for future work. Rabinovitz and Grumberg describe an extension of CBMC to concurrent C programs [21], which translates C threads into SSA form and adds constraints for a bounded number of context switches, as described in [3]. This approach, however, is limited to two threads, and it requires adding to the formula sent to the SAT solver further constraints that bound the number of context switches and allowed interleavings.

Ganai and Gupta describe a lazy method for modelling multi-threaded concurrent systems using shared variables [14], but this method is restricted to two threads. Gupta et al. [TS] extend [14, 17] by supporting more than two threads and by combining dynamic partial order reduction with symbolic state space exploration. However, this method is incomplete, since it considers the concurrency semantics only up to a bounded depth, as in [JJ [53]. Grumberg et al. propose an algorithmic method based on SAT and BMC to model check a multi-process system using a series of under-approximated models [16]. This approach, however, does not integrate partial order reduction algorithms to reduce redundant interleavings, and it does not address the problem of model checking real-world embedded software in multi-core environments.

To the best of our knowledge, no previous work considers a comprehensive SMT-based BMC formulation to verify multi-threaded software using a set of under-approximation and widening models, nor the integration of partial order reduction algorithms into the UW framework. In contrast to [14, 24], our method can handle more than two threads and can detect deadlocks caused by mutex and condition operations. Our main contribution is an algorithmic method and corresponding tools to verify multi-threaded software using SMT in order to combat the verification complexity.

6. CONCLUSIONS AND FUTURE WORK 

Despite the large body of (theoretical) research on the verification of concurrent systems, there are only a few tools that analyze multi-threaded programs with shared variables. In this work, we presented an extension of the ESBMC model checker to support the verification of multi-threaded software with shared variables, mutexes, and condition variables using an SMT-based BMC framework. We also described three approaches (UW, lazy, and eager SMT-based BMC), implemented with partial-order reduction methods, in which the final formula is well suited to SMT solvers. Our experimental results show that our UW ESBMC approach outperforms the CEGAR approach implemented in the SATABS model checker. With the addition of deadlock detection to our modelling, we can find bugs that previous approaches are not able to find. Moreover, lazy ESBMC, which adds concurrency constraints lazily and incrementally, is able to find bugs quickly in non-trivial benchmarks. In future work, we would like to explore partial-order reduction methods in more depth, configure ESBMC for compatibility with any given compiler to break statements with multiple global variables, and investigate heuristics to update the set of additional literals in our UW ESBMC algorithm.